Please note that the file format depends on the length of the submitted PCR amplicons (or DNA fragments).
For PCR amplicons longer than 600 bp, results are provided in exactly the same format as described for our Complete Plasmid Sequencing service. Assuming that the submitted sample contains only one highly abundant PCR amplicon, we try to assemble the NGS reads into a single contig. If the submitted sample contains multiple amplicons, however, they must not share significant sequence similarity, as this could make the assembly results extremely difficult to interpret.
For PCR amplicons shorter than 600 bp, we try to detect all variants that occur at a frequency above 1%. The results are presented in a format similar to our CRISPR Sequencing results, with slight modifications.
In the following, we introduce the format of the delivered files. For the purpose of demonstration, we will use results for a subset of the public dataset "Danio rerio CRISPR amplicon sequencing set1 MiSeq" (SRR1586614). We will pretend these data come from a Complete Amplicon Sequencing order, named 1586614a in our system. The tube ID will be assigned as SRR. We assume that we processed the sample in our run 987654w, with the internal ID ZA12. For each sample, three files will be delivered. These files will be in a folder, in this case named 1586614a_year_month_day_Complete_Amplicon_Sequencing.
a FASTQ file:
This file, named SSR_987654w_ZA12.fastq, contains the raw data. You can analyze FASTQ data using any compatible NGS data software. Please note that paired-end reads are stored in a single FASTQ file in which the entries for the first and second read of each pair alternate. The first read of each pair comes before the second.
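The alternating layout of paired-end reads can be split back into pairs with a few lines of Python. The following is a minimal sketch, assuming that every record occupies exactly four lines (no line-wrapped sequences). The function names read_fastq and read_pairs and the two demo records are illustrative only; they are not part of the delivered data or of any particular software package.

```python
import io
from itertools import islice
from typing import Iterator, TextIO, Tuple

# One FASTQ record: (header, sequence, separator, quality)
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    while True:
        lines = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not any(lines):
            return  # end of file, or only blank lines left
        if (len(lines) < 4
                or not lines[0].startswith("@")
                or not lines[2].startswith("+")):
            raise ValueError(f"malformed FASTQ record near: {lines[0]!r}")
        yield (lines[0], lines[1], lines[2], lines[3])


def read_pairs(handle: TextIO) -> Iterator[Tuple[Record, Record]]:
    """Yield (first_read, second_read) tuples from an interleaved FASTQ file."""
    records = read_fastq(handle)
    for first in records:
        second = next(records, None)
        if second is None:
            raise ValueError("odd number of records: last read has no mate")
        yield first, second


# Made-up demo records; a real file would be opened with open(path).
demo = io.StringIO(
    "@pair1/1\nACGT\n+\nIIII\n"
    "@pair1/2\nTTGA\n+\nIIII\n"
)
for first, second in read_pairs(demo):
    print(first[0], second[0])
```

For a delivered file, the same loop would run over open("SSR_987654w_ZA12.fastq") instead of the in-memory demo.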
a FASTA file:
This file, named SSR_987654w_ZA12_Amplicons.seq, is a text file.
The first few lines are displayed here:
>SSR_ZA12_CONTIG_271_p1 24410 pairs of NGS reads, 47.61% plus 32784 pairs with SNPs
TGGGTCTTGCATTCAGGCTATTCTCTCACCTGACAGGCTGCTCCAGGTA
CGGTTGATGTCCCGCAGTGCTTGTTTCTCAGCGATGGCTCTCTTGATTT
CGTCCAGCACCAGGTTGAGCTCTTTGGAGCGTTTGAGGTTCTCTTGCTC
AGCGGAGTGCAGACGGTCTCTCAGGGCCAGAAACTCACGCTGGTAAATT
CCACCACATTACCTACAAGCAGAAGACACAGAACTCTATTAATTGTTGG
TTTTAAAACCCACACTTGTATAGAA
>SSR_ZA12_CONTIG_269_p2 7069 pairs of NGS reads, 13.78% plus 11520 pairs with SNPs
TGGGTCTTGCATTCAGGCTATTCTCTCACCTGACAGGCTGCTCCAGGTAC
GGTTGATGTCCCGCAGTGCTTGTTTCTCAGCGATGGCTCTCTTGATTTCG
TCCAGCACCAGGTTGAGCTCTTTGGAGCGTTTGAGGTTCTCTTGCTCAGC
GGAGTGCAGACGGTCTCAGGGCCAGAAACTCACGCTGGTAAATGTCCACC
ACATTACCTACAAGCAGAAGACACAGAACTCTATTAATTGTTGGTTTTAA
AACCCACACTTGTATAGAA
>SSR_ZA12_CONTIG_267_p3 6482 pairs of NGS reads, 12.64% plus 9740 pairs with SNPs
TGGGTCTTGCATTCAGGCTATTCTCTCACCTGACAGGCTGCTCCAGGTAC
GGTTGATGTCCCGCAGTGCTTGTTTCTCAGCGATGGCTCTCTTGATTTCG
TCCAGCACCAGGTTGAGCTCTTTGGAGCGTTTGAGGTTCTCTTGCTCAGC
GGAGTGCAGACGGTCAGGGCCAGAAACTCACGCTGGTAAATGTCCACCAC
ATTACCTACAAGCAGAAGACACAGAACTCTATTAATTGTTGGTTTTAAAA
CCCACACTTGTATAGAA
>SSR_ZA12_CONTIG_261_p4 2873 pairs of NGS reads, 5.6% plus 3931 pairs with SNPs
TGGGTCTTGCATTCAGGCTATTCTCTCACCTGACAGGCTGCTCCAGGTAC
GGTTGATGTCCCGCAGTGCTTGTTTCTCAGCGATGGCTCTCTTGATTTCG
TCCAGCACCAGGTTGAGCTCTTTGGAGCGTTTGAGGTTCTCTTGCTCAGC
GGAGTGCAGACGGCCAGAAACTCACGCTGGTAAATGTCCACCACATTACC
TACAAGCAGAAGACACAGAACTCTATTAATTGTTGGTTTTAAAACCCACA
CTTGTATAGAA
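Each FASTA header carries the contig name, a read-pair count, a percentage, and a count of read pairs with SNPs. If you want these numbers in tabular form, the sketch below shows one way to extract them in Python. It assumes that every header follows exactly the wording shown in the example above; the field names (read_pairs, percent, snp_read_pairs) and the function parse_amplicons are our own illustrative choices, and the demo sequences are truncated to the first line of the corresponding contig.

```python
import re
from typing import Iterable, List, NamedTuple


class Amplicon(NamedTuple):
    name: str
    read_pairs: int       # "N pairs of NGS reads"
    percent: float        # percentage given in the header
    snp_read_pairs: int   # "plus N pairs with SNPs"
    sequence: str


# Header wording is assumed to match the example file exactly.
HEADER = re.compile(
    r"^>(?P<name>\S+)\s+(?P<pairs>\d+) pairs of NGS reads, "
    r"(?P<pct>\d+(?:\.\d+)?)% plus (?P<snp>\d+) pairs with SNPs\s*$"
)


def _build(match: "re.Match[str]", chunks: List[str]) -> Amplicon:
    return Amplicon(
        name=match["name"],
        read_pairs=int(match["pairs"]),
        percent=float(match["pct"]),
        snp_read_pairs=int(match["snp"]),
        sequence="".join(chunks),
    )


def parse_amplicons(lines: Iterable[str]) -> List[Amplicon]:
    """Parse FASTA lines whose headers follow the amplicon header wording."""
    records: List[Amplicon] = []
    current = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records.append(_build(current, chunks))
            current = HEADER.match(line)
            if current is None:
                raise ValueError(f"unexpected header format: {line!r}")
            chunks = []
        elif current is None:
            raise ValueError("sequence line found before any header")
        else:
            chunks.append(line)
    if current is not None:
        records.append(_build(current, chunks))
    return records


# Headers copied from the example; sequences truncated to their first line.
demo = """\
>SSR_ZA12_CONTIG_271_p1 24410 pairs of NGS reads, 47.61% plus 32784 pairs with SNPs
TGGGTCTTGCATTCAGGCTATTCTCTCACCTGACAGGCTGCTCCAGGTA
>SSR_ZA12_CONTIG_261_p4 2873 pairs of NGS reads, 5.6% plus 3931 pairs with SNPs
TGGGTCTTGCATTCAGGCTATTCTCTCACCTGACAGGCTGCTCCAGGTAC
"""
for rec in parse_amplicons(demo.splitlines()):
    print(rec.name, rec.read_pairs, rec.percent, rec.snp_read_pairs, len(rec.sequence))
```

A header that deviates from the assumed wording raises an error instead of being skipped silently, so a format change is noticed immediately.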
a Multiple Alignment file:
This file, named SSR_987654w_ZA12.seq_aln.txt, is a text file, but you need to open it in a whitespace-friendly editor (such as Microsoft WordPad or any popular web browser) so that the alignment columns stay lined up. Do not open it in Notepad or Microsoft Word.
Again, only a part of the file is displayed below:
CLUSTAL O(1.2.0) multiple sequence alignment

SSR_ZA12_CONTIG_271_p1     TGGGTCTTGCATTCAGGCTATTCTCTCACCTGACAGGCTGCTCCAGGTACGGTTGATGTCCCGCAGTGCTTGTTTCTCAGCGATGGCTCTCTTGA 95
SSR_ZA12_CONTIG_269_p2     TGGGTCTTGCATTCAGGCTATTCTCTCACCTGACAGGCTGCTCCAGGTACGGTTGATGTCCCGCAGTGCTTGTTTCTCAGCGATGGCTCTCTTGA 95
SSR_ZA12_CONTIG_267_p3     TGGGTCTTGCATTCAGGCTATTCTCTCACCTGACAGGCTGCTCCAGGTACGGTTGATGTCCCGCAGTGCTTGTTTCTCAGCGATGGCTCTCTTGA 95
SSR_ZA12_CONTIG_261_p4     TGGGTCTTGCATTCAGGCTATTCTCTCACCTGACAGGCTGCTCCAGGTACGGTTGATGTCCCGCAGTGCTTGTTTCTCAGCGATGGCTCTCTTGA 95
SSR_ZA12_CONTIG_269_p5     TGGGTCTTGCATTCAGGCTATTCTCTCACCTGACAGGCTGCTCCAGGTACGGTTGATGTCCCGCAGTGCTTGTTTCTCAGCGATGGCTCTCTTGA 95
SSR_ZA12_CONTIG_228_p6     TGGGTCTTGCATTCAGGCTATTCTCTCACCTGACAGGCTGCTCCAGGTACGGTTGATGTCCCGCAGTGCTTGTTTCTCAGCGATGGCTCTCTTGA 95
SSR_ZA12_CONTIG_268_p7     TGGGTCTTGCATTCAGGCTATTCTCTCACCTGACAGGCTGCTCCAGGTACGGTTGATGTCCCGCAGTGCTTGTTTCTCAGCGATGGCTCTCTTGA 95
SSR_ZA12_CONTIG_259_p8     TGGGTCTTGCATTCAGGCTATTCTCTCACCTGACAGGCTGCTCCAGGTACGGTTGATGTCCCGCAGTGCTTGTTTCTCAGCGATGGCTCTCTTGA 95
SSR_ZA12_CONTIG_270_p9     TGGGTCTTGCATTCAGGCTATTCTCTCACCTGACAGGCTGCTCCAGGTACGGTTGATGTCCCGCAGTGCTTGTTTCTCAGCGATGGCTCTCTTGA 95
SSR_ZA12_CONTIG_273_p10    TGGGTCTTGCATTCAGGCTATTCTCTCACCTGACAGGCTGCTCCAGGTACGGTTGATGTCCCGCAGTGCTTGTTTCTCAGCGATGGCTCTCTTGA 95
                           ***********************************************************************************************

SSR_ZA12_CONTIG_271_p1     TTTCGTCCAGCACCAGGTTGAGCTCTTTGGAGCGTTTGAGGTTCTCTTGCTCAGCGGAGTGCAGACGGTCTCTC--AGGGCCAGAAACTCACGCT 188
SSR_ZA12_CONTIG_269_p2     TTTCGTCCAGCACCAGGTTGAGCTCTTTGGAGCGTTTGAGGTTCTCTTGCTCAGCGGAGTGCAGACGGTCTCA----GGGCCAGAAACTCACGCT 186
SSR_ZA12_CONTIG_267_p3     TTTCGTCCAGCACCAGGTTGAGCTCTTTGGAGCGTTTGAGGTTCTCTTGCTCAGCGGAGTGCAGACGGT----C--AGGGCCAGAAACTCACGCT 184
SSR_ZA12_CONTIG_261_p4     TTTCGTCCAGCACCAGGTTGAGCTCTTTGGAGCGTTTGAGGTTCTCTTGCTCAGCGGAGTGCAGAC------------GGCCAGAAACTCACGCT 178
SSR_ZA12_CONTIG_269_p5     TTTCGTCCAGCACCAGGTTGAGCTCTTTGGAGCGTTTGAGGTTCTCTTGCTCAGCGGAGTGCAGACGGTCTCA----AGGCCAGAAACTCACGCT 186
SSR_ZA12_CONTIG_228_p6     TTTCGTCCAGCACCAGGTTGAGCTGAA---------------------------------------------ACTCAGGGCCAGAAACTCACGCT 145
SSR_ZA12_CONTIG_268_p7     TTTCGTCCAGCACCAGGTTGAGCTCTTTGGAGCGTTTGAGGTTCTCTTGCTCAGCGGAGTGCAGACGGT---GC--AGGGCCAGAAACTCACGCT 185
SSR_ZA12_CONTIG_259_p8     TTTCGTCCAGCACCAGGTTGAGCTCTTTGGAGCGTTTGAGGTTCTCTTGCTCAGCGGAGTGCA--------------GGGCCAGAAACTCACGCT 176
SSR_ZA12_CONTIG_270_p9     TTTCGTCCAGCACCAGGTTGAGCTCTTTGGAGCGTTTGAGGTTCTCTTGCTCAGCGGAGTGCAGACGGTCTT-C--AGGGCCAGAAACTCACGCT 187
SSR_ZA12_CONTIG_273_p10    TTTCGTCCAGCACCAGGTTGAGCTCTTTGGAGCGTTTGAGGTTCTCTTGCTCAGCGGAGTGCAGACGGTCTCAGAAAGGGCCAGAAACTCACGCT 190
                           ************************ ::                                                   *****************

SSR_ZA12_CONTIG_271_p1     GGTAAATGTCCACCACATTACCTACAAGCAGAAGACACAGAACTCTATTAATTGTTGGTTTTAAAACCCACACTTGTATAGAA 271
SSR_ZA12_CONTIG_269_p2     GGTAAATGTCCACCACATTACCTACAAGCAGAAGACACAGAACTCTATTAATTGTTGGTTTTAAAACCCACACTTGTATAGAA 269
SSR_ZA12_CONTIG_267_p3     GGTAAATGTCCACCACATTACCTACAAGCAGAAGACACAGAACTCTATTAATTGTTGGTTTTAAAACCCACACTTGTATAGAA 267
SSR_ZA12_CONTIG_261_p4     GGTAAATGTCCACCACATTACCTACAAGCAGAAGACACAGAACTCTATTAATTGTTGGTTTTAAAACCCACACTTGTATAGAA 261
SSR_ZA12_CONTIG_269_p5     GGTAAATGTCCACCACATTACCTACAAGCAGAAGACACAGAACTCTATTAATTGTTGGTTTTAAAACCCACACTTGTATAGAA 269
SSR_ZA12_CONTIG_228_p6     GGTAAATGTCCACCACATTACCTACAAGCAGAAGACACAGAACTCTATTAATTGTTGGTTTTAAAACCCACACTTGTATAGAA 228
SSR_ZA12_CONTIG_268_p7     GGTAAATGTCCACCACATTACCTACAAGCAGAAGACACAGAACTCTATTAATTGTTGGTTTTAAAACCCACACTTGTATAGAA 268
SSR_ZA12_CONTIG_259_p8     GGTAAATGTCCACCACATTACCTACAAGCAGAAGACACAGAACTCTATTAATTGTTGGTTTTAAAACCCACACTTGTATAGAA 259
SSR_ZA12_CONTIG_270_p9     GGTAAATGTCCACCACATTACCTACAAGCAGAAGACACAGAACTCTATTAATTGTTGGTTTTAAAACCCACACTTGTATAGAA 270
SSR_ZA12_CONTIG_273_p10    GGTAAATGTCCACCACATTACCTACAAGCAGAAGACACAGAACTCTATTAATTGTTGGTTTTAAAACCCACACTTGTATAGAA 273
                           ***********************************************************************************
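To locate the positions at which the aligned contigs differ without scanning the alignment by eye, the file can also be read programmatically. The following Python sketch is one possible approach and makes two assumptions: each alignment row starts with the sequence name in the first column, and consensus rows start with whitespace. The function names parse_clustal and variable_columns and the small two-sequence demo alignment are made up for illustration.

```python
from typing import Dict, List


def parse_clustal(text: str) -> Dict[str, str]:
    """Concatenate the aligned chunks of each sequence across all blocks."""
    seqs: Dict[str, str] = {}
    for line in text.splitlines():
        # Skip blank lines, the CLUSTAL title line, and consensus rows
        # (consensus rows are assumed to start with whitespace).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        parts = line.split()
        if len(parts) < 2:
            raise ValueError(f"unexpected alignment row: {line!r}")
        name, chunk = parts[0], parts[1]  # an optional trailing count is ignored
        seqs[name] = seqs.get(name, "") + chunk
    return seqs


def variable_columns(seqs: Dict[str, str]) -> List[int]:
    """Return 0-based alignment columns where the sequences are not identical."""
    rows = list(seqs.values())
    if not rows:
        return []
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned sequences differ in length")
    return [i for i, column in enumerate(zip(*rows)) if len(set(column)) > 1]


# Made-up toy alignment in the same layout as the delivered file.
demo = """\
CLUSTAL O(1.2.0) multiple sequence alignment

seqA      ACGTAC--GT 8
seqB      ACGTACTTGT 10
          ******  **

seqA      AAC 11
seqB      ATC 13
          * *
"""
alignment = parse_clustal(demo)
print(variable_columns(alignment))
```

Columns that contain a gap in at least one sequence are reported as variable, which is usually what is wanted when looking for insertions and deletions.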
a Coverage file:
This file, named SSR_987654w_ZA12_coverage.xlsx, is an Excel file providing depth and coverage information for each individual base, which is valuable for assessing the sequence quality at that position. If two potential contigs differ by just 1 to 3 SNPs, the two sequences are consolidated into one. The positions of the possible SNPs are indicated in the Excel file, though most of these "SNPs" are obviously due to sequencing errors.
A descriptive example of a coverage file can be reviewed here.
That example is specific to our Complete Plasmid Sequencing service; the corresponding Complete Amplicon Sequencing file will look very similar.